An Automatic Chinese Document Revision System Using Bit and Character Mask Approach

Author

  • June-Jei Kuo
Abstract

Errors in Chinese documents arise mainly in two stages: input and editing. They include homonym or homophone selection errors, ambiguous pronunciation errors, word segmentation errors, similar-shape character errors, editing operation errors, and so on. To improve the quality of Chinese text, conventional Chinese document revision systems have used similar-character sets and language models with statistical data. Nevertheless, these approaches have the following problems: (1) a perfect similar-character set is difficult to build; (2) because of copyright problems, a large and balanced Chinese corpus is very difficult to obtain; (3) the above editing errors cannot be solved simultaneously; (4) the average successful revision rate does not exceed 75%. In this paper we study the features of Chinese and the phonetic-input-to-character conversion system for Chinese. We find that Chinese phonetic information and the related conversion algorithm are of great help in detecting and revising input errors in Chinese documents. For editing errors, we propose a special code structure for Chinese pronunciation in which similar pronunciations differ by only one bit. In addition, bit mask and character mask techniques are proposed, respectively. Experimental results show that the average successful revision rate of the proposed system is close to 87%.
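The idea of a pronunciation code in which similar pronunciations differ by only one bit can be illustrated with a minimal sketch. The abstract does not give the actual code layout, so everything below is an assumption made for illustration: the 4-bit codes, the choice of the lowest bit as the "confusion" bit, the pinyin initials used, and the names `INITIAL_CODE`, `similar_pronunciation`, and `similar_initials`.

```python
# Hypothetical 4-bit codes for a few Mandarin initials (pinyin spelling).
# Assumption: each commonly confused pair (z/zh, c/ch, s/sh, l/n) shares
# its upper bits and differs only in the lowest bit. This layout is
# invented for illustration and is not taken from the paper.
INITIAL_CODE = {
    "z": 0b0000, "zh": 0b0001,
    "c": 0b0010, "ch": 0b0011,
    "s": 0b0100, "sh": 0b0101,
    "l": 0b1000, "n": 0b1001,
}

CONFUSION_MASK = 0b0001  # the single bit that separates similar sounds


def similar_pronunciation(a, b, mask=CONFUSION_MASK):
    """True if a and b are distinct and differ only in the masked bit(s)."""
    diff = INITIAL_CODE[a] ^ INITIAL_CODE[b]  # bits where the codes disagree
    return diff != 0 and (diff & ~mask) == 0


def similar_initials(a, mask=CONFUSION_MASK):
    """List every other initial that matches a once the masked bits are ignored."""
    return sorted(b for b in INITIAL_CODE
                  if similar_pronunciation(a, b, mask))


if __name__ == "__main__":
    for initial in ("z", "sh", "n"):
        print(initial, "->", similar_initials(initial))
```

With a layout of this kind, finding the candidate pronunciations for a suspected input error needs only an XOR and a mask test per candidate, with no lookup in a hand-built table of similar sounds.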

Similar resources

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to sift through such a massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...

Automatic Workflow Generation and Modification by Enterprise Ontologies and Documents

This article presents a novel method and development paradigm that proposes a general template for an enterprise information structure and allows for the automatic generation and modification of enterprise workflows. This dynamically integrated workflow development approach utilises a conceptual ontology of domain processes and tasks, enterprise charts, and enterprise entities. It also suggests...

Okapi Chinese Text Retrieval Experiments at TREC-6

The focus of the Okapi TREC-6 Chinese experiments is on investigating the effectiveness of different automatic indexing methods and phrase weighting for retrieval based on probabilistic models over Chinese text. We compare different probabilistic weighting methods based on a range of word and single character approaches. There are two indexing methods used in our experiments. One indexing method i...

A Prototype of Multi-Font Printed Chinese Character Reader

An approach to multi-font printed Chinese character recognition is proposed in this paper. The problems of inputting image of characters, preprocessing, character segmentation, feature extraction as well as character classification have been discussed. According to the characteristics of multi-font printed Chinese characters, the number of cutting across strokes, the external and internal areas w...

Publication date: 1998